Batch Load
Introduction
GigaSpaces Smart DIH now provides the ability to define batch loads via a standard pipeline interface. Batch load can be performed without the use of IIDR (IBM InfoSphere Data Replication).
- Full batch load can now be performed as a "pull" operation, without requiring the "push" functionality used by a typical IIDR deployment.
- Batch load can be managed by SpaceDeck or via REST API.
- Tables with materialized views (the executed query and its results saved as a table) and views (the query itself saved as SQL) are supported. As these table types have no primary key, a Space ID must be defined.
- A direct JDBC connection to the data source is supported.
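For an Oracle source, the JDBC connection is typically defined with a standard Oracle thin-driver URL; the host, port and service name below are placeholders:
jdbc:oracle:thin:@//[oracle-host]:[port]/[service-name]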
Configuring Batch Load: Helm
Enabling
Batch load is enabled through Kubernetes orchestration. It is not enabled by default.
The following flag must be added to the helm command: global.batchload.enabled=true
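For example, the flag can be passed with --set when installing or upgrading the DIH umbrella chart. The release, repo and chart names below are placeholders and should match your own deployment; only the global.batchload.enabled flag is taken from this section:
helm install [release name] [dih repo name]/[umbrella chart] --set global.batchload.enabled=true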
Adding the Agent
For each data source created, a separate Batch Load agent must be installed. An agent can be installed either under the DIH umbrella or, using a separate helm chart, outside of it (see below).
To install an agent under the DIH umbrella: global.batchload-agent.enabled=true
To install an agent and control its name: global.batchload-agent.agent.name=[name of agent]
It is also possible to install the batch load agent outside of the helm umbrella. This would be used in the case of a client needing more than one agent (for example, for multiple Oracle databases): helm install di-agent [dih repo name]/di-agents --version 2.0.0 --set agent.name=[name of agent]
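As an illustration, the two approaches look as follows. The release, repo and umbrella chart names are placeholders; only the flags and the di-agents command shown above are taken from this section.
Under the DIH umbrella:
helm install [release name] [dih repo name]/[umbrella chart] --set global.batchload.enabled=true --set global.batchload-agent.enabled=true --set global.batchload-agent.agent.name=[name of agent]
Outside the umbrella (standalone agent):
helm install di-agent [dih repo name]/di-agents --version 2.0.0 --set agent.name=[name of agent]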
Supported Data Source and Loading Types
Currently, GigaSpaces supports full batch load from an Oracle database. More data sources and loading types will be added in future releases.
Creating a Data Source for Batch Load
Batch Load cannot be configured for a pipeline that is configured and running with CDC (Change Data Capture) via IIDR. To enable Batch Load, the appropriate configuration must be used when creating the Data Source.
To use Batch Load when creating a Pipeline, add a new Pipeline by following the steps outlined in the User Guide: SpaceDeck - Spaces - Adding a Pipeline for Batch Load
User Flows: Creating a Pipeline using Batch Load
Batch Load cannot be configured for a pipeline that is configured and running with CDC (IIDR). To enable Batch Load, a new pipeline must be created.
Oracle Database: Define Basic Full Batch Load Pipeline
- Define Oracle as the Data Source with the connector type = BATCHLOAD (see the illustrative sketch at the end of this section).
Full batch load ends after the full load is completed, and the pipeline status should be Completed. This differs from a CDC pipeline, which continues to run and capture ongoing changes.
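As a rough sketch of what defining such a data source via the REST API might look like, the request below is illustrative only: the endpoint path, port and JSON field names are assumptions, not the documented API. Only the connector type BATCHLOAD and the use of an Oracle JDBC connection string come from this section; refer to the REST API documentation for the actual request format.
# Illustrative only - endpoint and field names are assumptions, not the documented API
curl -X POST "http://[dih-host]:[port]/api/v1/datasources" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "oracle-batch-source",
        "connectorType": "BATCHLOAD",
        "jdbcUrl": "jdbc:oracle:thin:@//[oracle-host]:[port]/[service-name]",
        "username": "[user]",
        "password": "[password]"
      }'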